Keyword Extraction From Chinese Text Based On Multidimensional Weighted Features
Author
Abstract
To address the incomplete coverage and low accuracy of keyword extraction from Chinese text, this paper proposes an extraction method based on the intrinsic features of the Chinese language and on multidimensional weighted feature values. Combining theoretical analysis with experimental computation, the method examines the part of speech, position, length, semantic similarity, and co-occurrence frequency of words in Chinese texts. Multidimensional data on word frequency, word feature values, word similarity, and word co-occurrence probability are combined into weighted feature values. A comparison of precision, recall, and F-measure shows that the proposed method extracts keywords more accurately than methods that rely on word frequency or basic feature values alone. These findings provide a reference for keyword extraction and text mining.
Subject Categories and Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing, Text analysis; H.2.8 [Database Applications]: Data mining
General Terms: Chinese text mining, sentiment analysis
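The abstract names several feature dimensions but does not say how each one is computed or weighted. The Python sketch below shows one way to combine such features into a single weighted score and then evaluate the result with precision, recall, and F-measure. It is not the authors' method. The following choices are all hypothetical:

- the POS weights in `POS_WEIGHTS` and the combination weights in `FEATURE_WEIGHTS`;
- the rule that treats the first sentence as the title;
- the length scaling;
- the co-occurrence count.

Semantic similarity is passed in as an optional `similarity` dict, which could for example be computed from word vectors. The input is assumed to be already segmented and POS-tagged.

```python
from collections import Counter
from itertools import combinations

# Hypothetical part-of-speech weights (n = noun, v = verb, a = adjective);
# the abstract does not give the paper's actual values.
POS_WEIGHTS = {"n": 1.0, "v": 0.6, "a": 0.5}
DEFAULT_POS_WEIGHT = 0.1

# Hypothetical weights for combining the feature dimensions into one score.
FEATURE_WEIGHTS = {"tf": 0.3, "pos": 0.2, "position": 0.2,
                   "length": 0.1, "cooc": 0.1, "sim": 0.1}


def score_words(sentences, similarity=None):
    """Return a weighted feature score for every word in a segmented text.

    sentences: non-empty list of sentences, each a list of (word, pos_tag)
        pairs; the first sentence is assumed to be the title.
    similarity: optional dict mapping word -> semantic similarity in [0, 1]
        (e.g. computed from word vectors); missing words score 0.
    """
    similarity = similarity or {}
    tf = Counter(w for sent in sentences for w, _ in sent)
    max_tf = max(tf.values())

    # Part of speech: keep the highest weight of any tag seen for the word.
    pos_w = {}
    for sent in sentences:
        for w, tag in sent:
            pos_w[w] = max(pos_w.get(w, 0.0), POS_WEIGHTS.get(tag, DEFAULT_POS_WEIGHT))

    # Position: words that appear in the title get the full position weight.
    title_words = {w for w, _ in sentences[0]}

    # Co-occurrence: distinct words that share at least one sentence with w.
    partners = {w: set() for w in tf}
    for sent in sentences:
        for a, b in combinations({w for w, _ in sent}, 2):
            partners[a].add(b)
            partners[b].add(a)
    max_partners = max(len(p) for p in partners.values()) or 1

    scores = {}
    for w in tf:
        features = {
            "tf": tf[w] / max_tf,
            "pos": pos_w[w],
            "position": 1.0 if w in title_words else 0.5,
            "length": min(len(w), 4) / 4,  # favours longer words, capped at 4 characters
            "cooc": len(partners[w]) / max_partners,
            "sim": similarity.get(w, 0.0),
        }
        scores[w] = sum(FEATURE_WEIGHTS[k] * v for k, v in features.items())
    return scores


def precision_recall_f(extracted, reference):
    """Precision, recall and F-measure of extracted keywords against a reference set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    text = [
        [("关键词", "n"), ("提取", "v")],  # title
        [("本文", "r"), ("研究", "v"), ("关键词", "n"), ("提取", "v"), ("方法", "n")],
        [("特征", "n"), ("加权", "v"), ("提高", "v"), ("关键词", "n"), ("提取", "v"), ("精度", "n")],
    ]
    scores = score_words(text)
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(top)
    print(precision_recall_f(top, ["关键词", "提取", "加权"]))
```

Comparing against a word-frequency baseline amounts to setting every weight except `"tf"` to zero. In practice the weights would typically be tuned on annotated data, and segmentation and tagging would come from a Chinese NLP toolkit.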
Similar resources
Exploring Multidimensional Continuous Feature Space to Extract Relevant Words
With growing amounts of text data, descriptive metadata become more crucial for processing that data efficiently. Keywords are one kind of such metadata, encountered, e.g., in everyday web browsing. Such metadata can be of benefit in various scenarios, such as web search or content-based recommendation. We study the keyword extraction problem from the perspective of vector space and ...
Micro-blog Keyword Extraction Method Based on Graph Model and Semantic Space
There has been much research on domain-specific keyword extraction, but micro-blog-oriented keyword extraction is only beginning. This paper studies keyword extraction from Chinese micro-blogs. Taking into account the characteristics of micro-blogs, such as brevity and topic divergence, we propose a Chinese micro-blog keyword extraction method based on the combination of multi feature...
Joint Learning of Chinese Words, Terms and Keywords
Previous work often used a pipelined framework in which Chinese word segmentation is followed by term extraction and keyword extraction. Such a framework suffers from error propagation and cannot use information from later modules to improve earlier components. In this paper, we propose a four-level Dirichlet Process based model (DP-4) to jointly learn the word distributions from the corpus, domain ...
Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Depending on the model, keywords can represent the main concepts of a text without human intervention. Keywords are important terms that describe a text and play a key role in understanding its content quickly and accurately. The purpose of keyword extraction is to identify the subject and main content of a text in the shortest possible time. Keyword extraction pl...
A Supervised Approach to Keyword Extraction from Persian Documents Using Lexical Chains
Keywords are the main focal points of interest within a text and are intended to represent the principal concepts outlined in the document. Determining keywords with traditional methods is a time-consuming process that requires specialized knowledge of the subject. To index the vast number of electronic documents, it is important to automate the keyword extraction task. S...